Research on Online Topic Evolutionary Pattern Mining in Text Streams

Authors

  • Qian Chen
  • Zhiguo Gui
  • Xin Guo
Abstract

Text streams are a class of ubiquitous data that arrive over time and are so large in scale that we often lose track of them. Text streams form a fundamental source of information that can be used to detect semantic topics of interest to individuals and organizations, as well as burst events within communities. An intelligent system that can automatically extract interesting temporal patterns from text streams is therefore badly needed; however, evolutionary pattern mining has not been well addressed in previous work. In this paper, we begin a tentative study of a topic evolutionary pattern mining system by formally defining a topic and discussing its properties in full, and by proposing a common, formal framework for analyzing text streams. We also define three basic tasks, (1) online topic detection, (2) event evolution extraction, and (3) topic property life cycle, and propose a common mining algorithm for each. Finally, we exemplify the application of evolutionary pattern mining and show that interesting patterns can be extracted from a newswire dataset.

Similar resources

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

A Semantic Graph-Based Approach for Mining Common Topics from Multiple Asynchronous Text Streams

In the age of Web 2.0, a substantial amount of unstructured content is distributed through multiple text streams in an asynchronous fashion, which makes it increasingly difficult to glean and distill useful information. An effective way to explore the information in text streams is topic modelling, which can further facilitate other applications such as search, information browsing, and patter...

Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning

Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregate. In this paper, we analyze and compare the performance of three recently-proposed batch topic models—Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von-Mises Fisher (vMF) mixture...

Mining maximal frequent itemsets from data streams

Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The defectiveness of such a two-phase framew...

Efficiently Mining High Utility Sequential Patterns in Static and Streaming Data

High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come contin...

Journal title:
  • Journal of Multimedia

Volume 9  Issue 

Pages  -

Publication date 2014